Text Classification Using the N-Gram Graph Representation Model Over High Frequency Data Streams

Similar resources

Arabic Text Classification Using N-Gram Frequency Statistics: A Comparative Study

This paper presents the results of classifying Arabic text documents using the N-gram frequency statistics technique, employing a dissimilarity measure called the “Manhattan distance” and Dice’s measure of similarity. The Dice measure was used for comparison purposes. Results show that N-gram text classification using the Dice measure outperforms classification using the Manhattan measure.
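The comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the study's code. It assumes character trigrams, profiles built from relative n-gram frequencies, one concatenated training text per class, and a Dice coefficient computed on the sets of n-grams present. The helper names and the toy English data are hypothetical; the same functions accept Arabic strings unchanged.

```python
from collections import Counter

def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Overlapping character n-grams of `text`, padded with one space at each end."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def profile(text: str, n: int = 3) -> dict[str, float]:
    """Relative frequency of each character n-gram (values sum to 1)."""
    counts = Counter(char_ngrams(text, n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def manhattan(p: dict[str, float], q: dict[str, float]) -> float:
    """L1 distance over the union of n-grams; 0 means identical profiles."""
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in p.keys() | q.keys())

def dice(p: dict[str, float], q: dict[str, float]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) on the sets of n-grams present."""
    a, b = set(p), set(q)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def classify(text: str, class_profiles: dict[str, dict[str, float]],
             n: int = 3, measure: str = "dice") -> str:
    """Assign `text` to the most similar class profile."""
    doc = profile(text, n)
    if measure == "dice":  # similarity: pick the maximum
        return max(class_profiles, key=lambda c: dice(doc, class_profiles[c]))
    if measure == "manhattan":  # distance: pick the minimum
        return min(class_profiles, key=lambda c: manhattan(doc, class_profiles[c]))
    raise ValueError(f"unknown measure: {measure!r}")

# Hypothetical toy training data: one concatenated text per class.
training = {
    "sports": "the team won the match and the goal",
    "finance": "the bank raised the interest rate on loans",
}
class_profiles = {label: profile(text) for label, text in training.items()}
```

Note that the two measures are read in opposite directions: the class with the highest Dice score wins, while the class with the lowest Manhattan distance wins.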

Sentiment Classification over Opinionated Data Streams Through Informed Model Adaptation

Opinionated data streams are very popular data paradigms nowadays as more and more users share their opinions online about almost everything from products to persons, brands and ideas. One of the key challenges for opinionated stream mining is dealing with concept drifts in the underlying stream population by building learners that adapt to such concept changes. Ageing is a typical way of adapt...
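The abstract is cut off before it describes its own approach. The following is therefore only a generic illustration of ageing, not the paper's informed model adaptation. It shows a multinomial Naive Bayes sentiment classifier whose statistics are multiplied by a decay factor before each new example, so recent opinions outweigh older ones after a concept drift. The class name, the decay values, and the toy "sick" stream are hypothetical.

```python
import math
from collections import defaultdict

class AgedNaiveBayes:
    """Multinomial Naive Bayes whose statistics decay by `decay` per new example."""

    def __init__(self, decay: float = 0.99):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.word_weight = defaultdict(dict)   # class -> {word: aged count}
        self.class_words = defaultdict(float)  # class -> aged total word count
        self.class_docs = defaultdict(float)   # class -> aged document count
        self.vocab = set()

    def learn(self, tokens: list[str], label: str) -> None:
        # Age all existing statistics first, so older examples count less.
        for c in self.class_docs:
            self.class_docs[c] *= self.decay
            self.class_words[c] *= self.decay
            for w in self.word_weight[c]:
                self.word_weight[c][w] *= self.decay
        # Then add the new example at full weight.
        for w in tokens:
            self.word_weight[label][w] = self.word_weight[label].get(w, 0.0) + 1.0
            self.class_words[label] += 1.0
            self.vocab.add(w)
        self.class_docs[label] += 1.0

    def predict(self, tokens: list[str]) -> str:
        total_docs = sum(self.class_docs.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, docs in self.class_docs.items():
            if docs <= 0.0:  # class has fully aged out (underflow)
                continue
            score = math.log(docs / total_docs)  # aged class prior
            for w in tokens:  # Laplace-smoothed aged likelihoods
                score += math.log((self.word_weight[c].get(w, 0.0) + 1.0)
                                  / (self.class_words[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best

# Hypothetical drift: "sick" is used negatively at first, then as praise.
stream = ["neg"] * 6 + ["pos"] * 4
aged = AgedNaiveBayes(decay=0.5)
static = AgedNaiveBayes(decay=1.0)  # decay 1.0 disables ageing
for label in stream:
    aged.learn(["sick"], label)
    static.learn(["sick"], label)
```

Without ageing, the model still predicts the older majority label. With ageing, it follows the recent shift. Decaying every stored count costs time proportional to the model size per example. A larger implementation would more likely keep one global scale factor and rescale lazily.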

Approximate Frequency Counts over Data Streams

Research in data stream algorithms has blossomed since the late 1990s. The talk will trace the history of the Approximate Frequency Counts paper: how it was conceptualized and how it influenced data stream research. The talk will also touch upon a recent development: the analysis of personal data streams to improve our quality of life. Biographical sketches: Gurmeet Manku (1973-) is a software engi...
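The Approximate Frequency Counts paper (Manku and Motwani) introduced, among others, the Lossy Counting algorithm. Below is a minimal sketch of it, written from the published description rather than from any reference code. The stream is split into buckets of width ⌈1/ε⌉. Each tracked item keeps a count f and a maximum undercount Δ, and at every bucket boundary, items with f + Δ ≤ current bucket id are pruned. The class and method names are hypothetical. The toy stream stands in for something like n-grams arriving on a high-frequency text stream.

```python
import math

class LossyCounter:
    """Lossy Counting: approximate item frequencies over a stream in bounded memory."""

    def __init__(self, epsilon: float):
        if not 0.0 < epsilon < 1.0:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width w
        self.n = 0                           # items seen so far (N)
        self.entries = {}                    # item -> [count f, max undercount delta]

    def add(self, item) -> None:
        self.n += 1
        bucket = (self.n - 1) // self.width + 1  # current bucket id, ceil(N / w)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:  # bucket boundary: prune rare items
            self.entries = {k: fd for k, fd in self.entries.items()
                            if fd[0] + fd[1] > bucket}

    def frequent(self, support: float) -> dict:
        """Items whose estimated count is at least (support - epsilon) * N."""
        if support <= self.epsilon:
            raise ValueError("support must exceed epsilon")
        threshold = (support - self.epsilon) * self.n
        return {k: fd[0] for k, fd in self.entries.items() if fd[0] >= threshold}

# Hypothetical stream: one heavy hitter followed by 40 one-off items.
counter = LossyCounter(epsilon=0.1)
for token in ["a"] * 60 + [f"x{i}" for i in range(40)]:
    counter.add(token)
print(counter.frequent(0.5))  # {'a': 60}
```

Under this scheme, each stored count underestimates the true count by at most εN, so every item whose true frequency is at least sN is reported. The one-off items above are all pruned by the end of the stream, which is what keeps memory bounded.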

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing the semantic concepts of text motivated researchers to use...
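For reference, the TF-IDF vectors named in this abstract can be computed in a few lines. This sketch uses one common variant, raw term count times log(N / df), where N is the number of documents and df is the number of documents containing the term. Libraries differ, for example in idf smoothing and vector normalization. The function names and toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors: raw term count * log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical pre-tokenized documents.
docs = [
    ["the", "cat", "sat"],
    ["the", "cat", "ate"],
    ["the", "dog", "ran"],
]
vectors = tfidf_vectors(docs)
```

A term that occurs in every document ("the") gets weight 0. Because each distinct word is its own dimension, documents that share only synonyms (say, "cat" and "kitten") score zero similarity on those terms, which is the semantic shortcoming the abstract refers to.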

Two-step Feature Selection Algorithm Based on N-gram Representation in Chinese Text Classification

Usually, there are two steps in the construction of an automated text classification system. The first step is that the texts are coded into a representation more suitable for the learning algorithm. There are various ways of representing a text such as by using word fragments, words, phrases, meanings, and concepts [82]. Different text representations have different dependence on the language ...
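The first step described in this abstract, choosing a representation, can be illustrated with three of the granularities it lists: word fragments, words, and phrases. The sketch below assumes whitespace-tokenized input for words and phrases. For unsegmented Chinese text, overlapping character bigrams are one common way to obtain word-fragment features without a word segmenter. The function names are hypothetical. The fragment helper drops whitespace, so in spaced languages its fragments may span word boundaries.

```python
def word_fragments(text: str, n: int = 2) -> list[str]:
    """Overlapping character n-grams, ignoring whitespace (suits unsegmented Chinese)."""
    chars = [ch for ch in text if not ch.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]

def words(text: str) -> list[str]:
    """Whitespace-delimited word tokens."""
    return text.split()

def phrases(text: str, n: int = 2) -> list[str]:
    """Word n-grams ("phrases") built from whitespace tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(word_fragments("文本分类"))  # ['文本', '本分', '分类']
print(words("text classification task"))  # ['text', 'classification', 'task']
print(phrases("text classification task"))  # ['text classification', 'classification task']
```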

Journal

Journal title: Frontiers in Applied Mathematics and Statistics

Year: 2018

ISSN: 2297-4687

DOI: 10.3389/fams.2018.00041